Support Vector Machine for Text Categorization
Authors
Abstract
In this paper, we studied the problem of classifying spam vs. non-spam e-mail within the general framework of text categorization. First, we applied the regular support vector machine (SVM) to this problem and tuned its parameters. We observed that the classification accuracy and generalization ability of the SVM classifier can be controlled through the bounds, the ratio of the bounds, and the choice of kernel. We also observed that a “stop list” is not a good preprocessing method for e-mail messages. Finally, we developed and tested a new idea, the Logical Support Vector Machine (LSVM), on this task. LSVM yields better results than the regular SVM. Our results show that SVM and LSVM are excellent methods for e-mail classification.
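The abstract does not say how the classifier, its preprocessing, or its "bounds" are set up. The sketch below is a minimal, self-contained illustration that rests on our own assumptions:

- E-mails are bag-of-words count vectors.
- The SVM is linear and trained with the Pegasos stochastic subgradient method. This solver is swapped in for brevity and is not necessarily what the authors used.
- "Bounds" is read as class-specific upper bounds C+ and C- on the dual variables, so that their ratio (here `c_neg / c_pos`) controls how heavily each class's errors are penalized.

The stop list, the toy e-mails, and the names `featurize`, `train_linear_svm`, and `predict` are all hypothetical. Non-linear kernels and the authors' LSVM are not described in the abstract and are not reproduced here.

```python
import random
import re
from collections import Counter

# Hypothetical stop list for illustration; not the list used in the paper.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "for", "you", "your"}


def featurize(text, use_stop_list=False):
    """Bag-of-words term counts, optionally dropping stop-list words first."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if use_stop_list:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(tokens)


def dot(w, x):
    """Inner product of the weight dict w with a sparse count vector x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.1, c_pos=1.0, c_neg=1.0, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent on the primal objective

        lam/2 * ||w||^2 + (1/n) * sum_i c_{y_i} * max(0, 1 - y_i * <w, x_i>)

    Labels are +1 (spam) and -1 (non-spam). In the equivalent dual problem the
    per-class weights become box constraints 0 <= alpha_i <= C_{y_i}, with
    C+ = c_pos / (lam * n) and C- = c_neg / (lam * n). No bias term is learned.
    """
    rng = random.Random(seed)
    w = {}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, label = X[i], y[i]
            violated = label * dot(w, x) < 1.0  # is the hinge loss active?
            for f in w:  # regularizer gradient: shrink all weights
                w[f] *= 1.0 - eta * lam
            if violated:  # weighted hinge-loss subgradient step
                c = c_pos if label > 0 else c_neg
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * c * label * v
    return w


def predict(w, x):
    """Return +1 (spam) for a positive score, otherwise -1 (non-spam)."""
    return 1 if dot(w, x) > 0 else -1


# Toy usage with made-up e-mails.
spam = ["win free money now", "free prize claim now", "cheap free offer today"]
ham = ["meeting agenda monday", "project meeting report attached", "team lunch meeting"]
X = [featurize(d) for d in spam + ham]
y = [1] * len(spam) + [-1] * len(ham)
# c_neg > c_pos: flagging legitimate mail as spam costs twice as much.
w = train_linear_svm(X, y, c_pos=1.0, c_neg=2.0)
print(predict(w, featurize("free money offer")))     # 1
print(predict(w, featurize("team meeting monday")))  # -1
```

Raising `c_neg` relative to `c_pos` makes false positives (legitimate mail marked as spam) more expensive, which is usually the error a spam filter most needs to avoid. Toggling `use_stop_list` is one simple way to compare preprocessing with and without a stop list on real data.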
Similar resources
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in the amount of information, tools and methods for searching, filtering, and managing resources are urgently needed. One of the major problems in text classification is the high dimensionality of the feature space. Therefore, reducing the dimensionality of the feature space is a central goal in text classification. There are many feature selection methods. However...
Text Categorization and Support Vector Machines
Text categorization is used to automatically assign previously unseen documents to a predefined set of categories. This paper gives a short introduction to text categorization (TC) and describes the most important tasks of a text categorization system. It also focuses on Support Vector Machines (SVMs), the most popular machine learning algorithm used for TC, and gives some justification why ...
Using Bag-of-Concepts to Improve the Performance of Support Vector Machines in Text Categorization
This paper investigates the use of concept-based representations for text categorization. We introduce a new approach to create concept-based text representations, and apply it to a standard text categorization collection. The representations are used as input to a Support Vector Machine classifier, and the results show that there are certain categories for which concept-based representations co...
Text categorization using topic model and ontology networks
Text categorization based on pre-defined document categories is one of the most crucial tasks in text mining applications in recent decades. Successful text categorization highly relies on the text representations generated from documents. In this paper, an innovative text categorization model, VSM_WN_TM, is presented. VSM_WN_TM is a special Vector Space Model (VSM) that incorporates word frequ...
A Novel Text Categorization Approach based on K-means and Support Vector Machine
With the continuous expansion of digital libraries and online news, a huge number of text documents exist on the web, and they need to be organized. Text categorization is an active research field that can be used to organize text documents. Text categorization is the process of assigning documents to predefined categories that are associated with their content. CAWP algorithm is de...
Support Vector Machine Parameter Optimization for Text Categorization Problems
This paper analyzes the influence of different parameters of the Support Vector Machine (SVM) on text categorization performance. The research is carried out on different text collections and different subject headings (up to 1168 items). We show that parameter optimization can substantially increase text categorization performance. An estimate of the range in which to search for the optimal parameters is given. We ...